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Abstract —We propose a fast sequential algorithm for the 
fundamental problem of estimating frequencies and amplitudes 
of a noisy mixture of sinusoids. The algorithm is a natural 
generalization of Orthogonal Matching Pursuit (OMP) to the 
continuum using Newton refinements, and hence is termed 
Newtonized OMP (NOMP). Each iteration consists of two phases: 
detection of a new sinusoid, and sequential Newton refinements 
of the parameters of already detected sinusoids. The refinements 
play a critical role in two ways: (1) sidestepping the potential 
basis mismatch from discretizing a continuous parameter space, 
(2) proYidingfeedback for locally refining parameters estimated in 
previous iterations. We characterize convergence, and provide a 
Constant False Alarm Rate (GEAR) based termination criterion. 
By benchmarking against the Cramer Rao Bound, we show that 
NOMP achieves near-optimal performance under a variety of 
conditions. We compare the performance of NOMP with classical 
algorithms such as MUSIC and more recent Atomic norm Soft 
Thresholding (AST) and Lasso algorithms, both in terms of 
frequency estimation accuracy and run time. 

Index Terms —Gridless Compressed Sensing, Orthogonal 
matching pursuit. Sparse approximation. Line spectral esti¬ 
mation, Frequency estimation, Newton refinement. Decision- 
feedback. 


I. Introduction 


Frequency estimation from a mixture of sinusoids in AWGN 
is a fundamental problem that arises in a variety of communi¬ 
cation and radar applications, including estimation of spatial 
channels (e.g., for phased arrays), temporal multipath channels 
(e.g. for equalization), and spatiotemporal channels (e.g., range 
and direction of arrival estimation for a target). In all of these 
applications, the spectrum of the measured signal consists of 
multiple discrete frequencies over a continuous interval. The 
problem of estimating the frequencies of such a signal is also 
known as line spectral estimation. 

In this paper, we propose an algorithm to estimate fre¬ 
quencies from N equi-spaced noisy samples in time, denoted 
by y G C^. Defining the unit norm sinusoid of frequency 
w G [0, 27r), by 




„i(N-l)a;lT 


( 1 ) 


the observed signal is a mixture of K sinusoids; 


K 

y = X + z, (0, ctHn) , (2) 

1=1 


where gi G C are the unknown complex gains. The signal 
to noise ratio for sinusoid is given by SNR; = \gi\‘^/a‘^. 


The goal of the algorithm is to provide reliable estimates of 
{{gi,uJi) : I = 1, 2,..., K} and K, the number of sinusoids 
in the mixture. 

The preceding model and its variants have many applica¬ 
tions. For a linear array with N elements with inter-element 
spacing d, the response corresponding to angle of arrival or 
departure 6 relative to broadside is given by x(w), where 
uj = 2TT{d/X) sin(0) is the spatial frequency corresponding to 
9, and A denotes the carrier wavelength. For estimation of a 
multipath channel h{t) = ~ channel trans¬ 
fer function H{f) = J^iLi ■ Sampling uniformly 

in the frequency domain with spacing A/ yields a mixture 
of sinusoids with toi = — 27rA/ri, reducing the problem of 
estimating delays to that of frequency estimation. This directly 
models the operation of stepped frequency continuous wave 
(SFCW) for imaging a collection of point scatterers. When 
the channel is “seen through” a pulse p{t), as is often the case 
for channel estimation in communication applications, then 
the noiseless frequency domain signal is given by Y (/) = 
^ simple extension of our algorithm 
to handle weighted sinusoids applies to this setting. We do 
not provide detailed discussion here due to lack of space, but 
the code that we have made available HI does provide the 
required flexibility. 

In this paper, we are interested in the setting where K is 
a small integer, so that the underlying signal has a sparse 
representation in the atomic set of unit norm sinusoids. The 
goal is to find the best sparse approximation of y using 
atoms in the dictionary of sinusoids. A promising algorithm in 
sparse approximation theory is Orthogonal Matching Pursuit 
(OMP) 0, El, a greedy algorithm that iteratively identifies 
the atom that yields the greatest improvement in approximation 
quality. If the structure of the atomic set is “simple,” so that 
identifying the “best” atom in each iteration needs a small 
amount of computation, then OMP becomes computationally 
attractive. Unfortunately, for our atomic set, searching over a 
continuum of atoms is not possible. An approximation that 
is often employed to overcome this problem is to discretize 
the set of frequencies, hoping that the signal still admits a 
sparse representation in this finite set. However, as discussed 
in a, no matter how finely we grid the parameter space, the 
true underlying atoms of y need not lie on the grid. This 
“off-grid effect,” or basis mismatch, degrades the performance 
of reconstruction algorithms significantly. The authors in 151 
propose inserting a gradient-based local search in a Matching 


2 


Pursuit 0 framework, as a means of alleviating the off-grid 
effect. In this paper, we go one step further by incorporat¬ 
ing a Newton-based Cyclic Refinement step into the OMP 
framework: this not only sidesteps the off-grid effect, but also 
enhances performance by refining previously estimated atoms 
in each iteration. 

A. Contributions 

Our key contributions are as follows: 

(1) We propose Newtonized OMP (NOMP), which detects 
the best atom over a discrete grid, but avoids basis mismatch 
by adding a Newton refinement step, thus emulating pursuit 
over the continuum. In addition, we go beyond OMP by 
locally refining all estimated atoms in each iteration, thus re¬ 
evaluating estimates of previously detected sinusoids to incor¬ 
porate the effect of the newly detected sinusoid. This second 
refinement step can be interpreted as feedback presented to the 
atoms we have already detected. We show that this feedback 
mechanism plays a crucial role in handling interference among 
the underlying atoms of y, yielding estimation accuracies 
far better than what would be possible by standard greedy 
pursuit algorithms (e.g., OMP and Matching Pursuit). We do 
not require explicit estimates of model order, and provide a 
stopping criterion based on CFAR (constant false alarm rate) 
based on an estimate of the noise variance. 

(2) We prove convergence of NOMP by providing upper 
bounds on the number of iterations. Moreover, we derive a 
bound on the convergence rate, showing that by choosing a 
fine enough grid for detection, the bound approaches the rate 
of convergence for OMP on the continuum. 

(3) We show that the algorithm is near-optimal by numerical 
comparisons against the Cramer Rao Bound (CRB) 171 in a 
variety of settings. When the frequencies of the sinusoids in 
the mixture are well-separated, NOMP is able to achieve the 
CRB for any sinusoid that has an SNR greater than a certain 
threshold. Moreover, NOMP is able to resolve closely-spaced 
frequencies with near-optimal accuracy, as long as there is 
enough disparity in the SNRs of different sinusoids. 

(4) We analyze the computational complexity of NOMP. Our 
numerical experiments show that its run time is significantly 
smaller than that of recently proposed state-of-the-art Atomic 
norm Soft Thresholding (AST) algorithm 0, and is even 
smaller than that of the classical MUSIC algorithm Q. We 
evaluate the run time of the algorithm in a variety of settings. 
A freely downloadable software package implementing the 
proposed algorithm can be found in m. 

B. Related work 

Line spectral estimation is a fundamental problem in sta¬ 
tistical signal processing. Classical (and popular) subspace 
methods such as MUSIC and ESPRIT ||9l, IITOl exploit the 
low-rank structure of the autocorrelation matrix. One of the 
major advantages of these methods is the capability of resolv¬ 
ing multiple closely-spaced frequencies at high SNR. Both 
MUSIC and ESPRIT have been shown to be asymptotically 
optimal in the limit of infinite SNR mu, but their perfor¬ 
mance degrades at medium and low SNRs. Another family of 


DET-based classical methods mu, mu, typically have lower 
computational complexity and estimation accuracy similar to 
that of subspace methods ITSl . llT4ll . 

More recent techniques using convex optimization cast the 
frequency estimation problem as that of finding a sparse 
approximation of the received signal using an infinite¬ 
dimensional dictionary of sinusoids. It is shown in IfTSlI that, 
in the absence of noise, total-variation norm is able to locate 
frequencies with infinite precision, as long as the minimum 
frequency separation exceeds 4 x Adu where Adu = 27r/A 
is the Discrete Eourier Transform (DET) grid spacing. The 
sufficient condition on required minimum separation has been 
recently improved to 2.52 x Adft in ifThl . An extension to noisy 
scenarios is provided in HU- Another approach is Atomic 
norm Soft Thresholding (AST) 0, lITSl . which provides 
theoretical guarantees of noise robustness in terms of mean 
squared error (MSE). Both total-variation norm and atomic 
norm are generalizations of the fi norm to infinite-dimensional 
settings. Solving these optimization problems involves solving 
the Lagrange dual which takes the form of a semi-infinite 
program (SIP) with finite-dimensional decision variables and 
infinitely many constraints. Eor the sparse frequency estima¬ 
tion problem, ini and 0 reformulate the dual as a semidef- 
inite program (SDP), which enables numerical optimization. 
Similar reformulation for other problems seems to be difficult 
na. A pragmatic approach is to use Lasso optimization on 
a highly oversampled grid as an approximation for AST 0, 
na. Both AST and Lasso are benchmarks that we compare 
our proposed algorithm against in our numerical experiments. 

NOMP can be viewed as coordinate optimization over a 
continuum. Coordinate-wise descent has been widely used for 
sparse approximation; for example, such methods are compet¬ 
itive for solving Lasso type problems lfT9ll . Il20ll . Preliminary 
results in lITTl show that coordinate descent to a relaxation of 
the AST problem can be a means to speed up implementation. 

There is a large body of work on feedback for improving 
the performance of iterative greedy algorithms for sparse 
approximation ll22l . Il23l . Il24ll . Il25ll . 1261, 1271 : information 
from recent iterations is used to remove errors introduced 
by previous steps. Eor example, l2^ introduces a forward- 
backward greedy algorithm that allows aggressive backward 
steps (discarding small-magnitude atoms) after a greedy for¬ 
ward step. In contrast, NOMP, by virtue of its continuous 
parameterization, can employ a mild form of feedback by 
locally refining the set of estimated atoms, thus allowing an 
atom to be replaced by a nearby, highly correlated atom, in 
the continuum. The idea of embedding a local refinement step 
in OMP has been proposed in l28l in a discrete setting, but 
this algorithm does not address basis mismatch, and assumes 
that the model order is known a priori. 

Many theoretical results on OMP 0, 123, 1^ are based 
on assumptions such as incoherence or Restricted Isometry 
Property (RIP) for the underlying dictionary. Such approaches 
do not work for analyzing NOMP, since nearby atoms are 
highly correlated for continuous frequency estimation. Instead, 
our convergence analysis borrows tools from analysis of AST- 
based line spectral estimation 0, along with observations 
following from our CEAR-based design. 
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For a single sinusoid, frequency estimation using coarse 
detection followed by Newton refinement was proposed three 
decades ago by Abatzoglou ED. This was recently adapted 
for estimation of a single delay in EHj shown to approach 
estimation-theoretic bounds. In prior work involving some of 
the authors, we have used sequential algorithms similar to 
NOMP for millimeter wave spatial channel estimation with 
compressive measurements ES, E31- The NOMP algorithm 
in the present paper is an improvement on these, with a 
principled CFAR-based stopping criterion. We present NOMP 
within an application-independent abstraction, recognizing the 
fundamental nature and widespread utility of the frequency 
estimation problem. We have reported on a version of NOMP 
in a recent conference paper ll^ (although the algorithm in 
ESl omits a least squares step included here for establishing 
convergence rate results, but with little impact on practical 
performance). The present paper goes beyond ESl in pro¬ 
viding a detailed convergence analysis, and a comprehensive 
comparison with the state of the art. 

A closely related algorithm to NOMP is proposed in EH, 
which employs a Bayesian framework for frequency esti¬ 
mation, using Newton refinements for updating the frequen¬ 
cies. The details are different from our non-Bayesian, CFAR 
framework, and convergence analysis is not provided, but the 
benefits of Newtonization are also evident in the numerical 
results in ESi- 

Outline: We present our algorithm in SectionlH] In Sectionlllll 
we present the CFAR-based stopping criterion, together with a 
simplified analytical characterization of false alarm and miss 
probabilities. In Section IIVI we show why oversampling is 
essential for discretization followed by refinement to emulate 
pursuit over the continuum. We discuss convergence in Section 
0 In Section IVIl we report on numerical experiments, and 
compare NOMP with other methods in terms of estimation 
accuracy and computational complexity. Section Ivnl discusses 
extension of the algorithm to more general settings, with 
illustrative numerical results for compressive measurements. 
Section IVIIII concludes the paper. 

We set N = 256 throughout in our numerical results (other 
values of N yield entirely similar trends). 

Notation: Complex conjugate transpose of v is denoted by 
. 5R{a} is the real part of complex number a. The Moore- 
Penrose pseudo-inverse of matrix A is denoted by . The 
distance between any two frequencies {cu/jWfc} is defined by 
distjwfejO;;) = minaGZ — uJi + 27ra|, i.e. the wrap-around 
distance when we restrict frequencies to lie in [0,27r). The 
DFT matrix with unit norm columns and the corresponding 
grid spacing are denoted by T and Adft, respectively. The 
inner product between v,u G is defined as {v, u) = u^v. 

II. NOMP Algorithm 

We first discuss estimation of a single sinusoid, and then 
build on it to generalize to a mixture of sinusoids. 

A. Single Frequency 

We have y = px(a;) -f z. The Maximum Likelihood 
(ML) estimate of the gain and frequency are obtained by 


minimizing the residual power ||y — (;x(a;)||^, or equivalently, 
by maximizing the function 

S{g,uj) = 2U{y^gx{u})} - \g\^ ||x(w)f . (3) 

Directly optimizing S{g,uj) over all gains and frequencies 
is difficult. Therefore, we adopt a two stage procedure; (1) 
Detection stage, where we find a coarse estimate of uj by 
restricting it to a discrete set, (2) Refinement stage, in which 
we iteratively refine gain and frequency estimates. 

For any given uj, the gain that maximizes S{g,uj) is given 
hy g = (x(a;)^y)/ ||x(a;)||^. Substituting g in S{g,uj) yields 
that the generalized likelihood ratio test (GLRT) estimate of 
UJ (treating g as a nuisance parameter) is the solution to the 
following optimization problem; 

a} = argmax Gy{uj), (4) 

UJ 

where 

G'y(w) = |x(a;)^y|Yl|x(a;)f (5) 

is the GLRT cost function. We use this observation to find a 
coarse estimate of {g, uj) in the Detection stage. 

Detection: We obtain a coarse estimate of uj by restricting 
it to a finite discrete set denoted by 17 A {k{2Tr/jN) : 
k = 0,1,{'-fN — 1)}, where 7 is the over-sampling 
factor relative to the DFT grid. For our simulation results, 
we set 7 = 4 . The outputs of this stage are Wc G that 
maximizes the GLRT cost function (|4|i, and the corresponding 
gain (x(we)^y)/||x(a;c)f. 

Refinement: Since uj can take any value in interval [0, 27r), 
we add a Newton-based refinement stage for estimation on the 
continuum. Let {g, uj) denote the current estimate. The Newton 
step for frequency refinement is given by 

uj'= UJ - S{g,uj)/S{g,uj) ( 6 ) 

where 

S{g,uj) = ^{{y - gx{uj))^g{dx{uj)/duj)} (7) 

S{g,uj) = 5R{(y-5x(u;))^5((i^x(a;)/dw^)} (8) 

-\gf\\djc{uj)/dujf. 

As we want to maximize S{g,uj), we only apply the update 
rule (|6]l when the function is locally concave (i.e. S{g, uj) < 0). 
The gain parameter is then updated to maximize S{g,uj'): 

5' = (x(A')^y)/||x(;u')f- 

Refinement Acceptance Condition (RAC): We accept a 
refinement only if it leads to a strict improvement in Gy{uj)\ 
that is, if Gy(uj') > Gy{uj). This ensures that an accepted 
refinement can only decrease the overall residual energy, 
and that the residual energy is non-increasing throughout the 
course of the algorithm, which ensures convergence, as shown 
in Section IV] 

B. Multiple Frequencies 

Let V — {{gi,uJi) , / = 1,..., fc} denote a set of estimates 
of the parameters of the sinusoids in the mixture. Let 

l=k 

yr(T’) = y-^pix(wi) 


(9) 
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denote the residual measurement corresponding to this esti¬ 
mate. The following procedure is a direct generalization of the 
single sinusoid refinement algorithm to multiple frequencies. 
It proceeds by employing the single sinusoid algorithm to 
perform Newtonized coordinate descent on the overall residual 
energy ||yr(7^)||^- One step of this coordinate descent involves 
cycling through all sinusoids in P in a predetermined order. In 
this process, suppose that we wish to refine the ^-th sinusoid; 
we treat y^{'P\{{gi,uJi)}) as our measurement y and employ 
the single frequency update step to refine {gi,oji). 


Algorithm 1 Newtonized Orthogonal Matching Pursuit 
1 : Procedure EXTRACTSPECTRUM(y, r): 

2 : m ^ 0, Vo = {} 

3: while maxi^gDFT GyjCp^)(a;) > t do 
4: m 4— m + 1 

5: IDENTIEY 

w = argmax<^6n 

and its corresponding gain 

g^ (x(w)^yr(T’m-r))/||x(w)f 

^ O {(5, w)} 

7: Single Refinement: Refine (p,(D) using single fre¬ 

quency Newton update algorithm (Rg Newton steps) to 
obtain improved estimates {g’,Lu’). 

^ ^m-l O {( 5 ', w')} 

9: Cyclic Refinement; Refine parameters in V^ 

one at a time; For each {g,uj) G V^ we treat 
yr{V^ \ {{g,uj)}) as the measurement y, and apply 
single frequency Newton update algorithm. We perform 
Rc rounds of cyclic refinements. Let V'^ denote the new 
set of parameters. 

10: Update all gains in 7^"' by least squares; 

X = [x(wi)... x(a;m)], {w;} are the frequencies in V!^ 

[gi ■ --gruf = Arty 

Let Vm denote the new set of parameters. 

II: return Vm 


The NOMP procedure is summarized in Algorithm [T] We 
now briefly discuss the role of its main components: 

• Single Refinement (Stepl?]); Ideally, we want to identify 
w which maximizes the GLRT cost function over the contin¬ 
uum. The Single Refinement step emulates search over the 
continuum by locally refining the estimate of w obtained by 
picking the maximum over the discrete set fl. 

• Cyclic Refinement (Step |9]) is where NOMP diverges 
from forward greedy methods llJTll (in particular OMP) by 
^mVidmgfeedback for local refinements of previously detected 
sinusoids. This gives them an opportunity to better explain 
the received signal in light of new information regarding the 
presence of another sinusoid. This feedback is presented in the 
form of an updated residue. As we see in Sections [V-Cl and lVll 
this step is crucial for fast convergence and high estimation 


accuracy. 

• Update by least squares (StepfToli: Here we update gains by 
projecting the received signal y onto the subspace spanned by 
the estimated frequencies. This ensures that the residual energy 
is the minimum possible for the current set of estimated fre¬ 
quencies. We see in Section |V] that performing this projection 
step just prior to detecting a new sinusoid enables us to lower 
bound the convergence rate of NOMP by mirroring arguments 
used to establish bounds on OMP convergence iJTl . 

We have left the number of refinement steps unspecified so 
far. For the simulations in this paper, we set Rg = 1 for every 
newly detected sinusoid in the SINGLE Refinement step, and 
i?c S {1,3,5} refinement rounds in the CYCLIC Refinement 
step, depending on the difficulty of the estimation problem. 

C. Complexity analysis 

We analyze the computational complexity of the main 
steps of NOMP assuming that the algorithm runs for exactly 
K iterations (i.e., perfect stopping). Checking whether the 
stopping criterion is satisfied is efficiently implemented 
using Fast Fourier Transform (FFT), with complexity 
0{KNlog{N)). The IDENTIFY Step involves computing the 
GLRT cost function over the set of frequencies defined by Cl. 
This can also be computed using FFTs in 0{yKNlog{'yN)) 
time. The SINGLE Refinement Step takes only 0{RsN) 
operations per sinusoid, hence the total cost for SINGLE 
Refinement is 0{RgKN). The Cyclic Refinement 
involves refining all frequencies that have been estimated so 
far, and has overall complexity 0{RcRsK^N). If we directly 
compute the pseudo-inverse and apply it to the vector of 
observation (i.e., X^y) in the UPDATE Step, the complexity 
is 0{NK^ + K^) per iteration, and the overall cost is 
0{NK^ K^). However, we note that iterative methods 

(such as Richardson’s iteration or conjugate gradient f22\ ) are 
extremely efficient in computing the least squares solution 
and can be used for speeding up the UPDATE Step. Empirical 
observations suggest that the CYCLIC Refinement step 
dominates the overall computational cost of NOMR 

Minimum frequency separation: When two frequencies, say 
wi and 012 , are “very close”, intuitively, the mixture 5 ix(a;i)-|- 
g 2 ^{oJ 2 ) is explained “very well” by a single frequency, say as 
{gi +t 72 )x(a;i). Thus, a natural metric to characterize regimes 
for testing algorithms for mixture frequency estimation is the 
minimum frequency separation between any two sinusoids. We 
denote this by Awmin = min^^/dist(a;fc, w/) and we would 
like our algorithms to work well even for small values of 

AWniin. 

Without a minimum frequency separation condition, the 
estimation problem can be hopelessly ill-posed. This has 
been studied in detail in ifTSl using Slepian’s work on prolate 
spheroidal sequences ll38l . It has been shown that if the 
frequencies are clustered together, it becomes impossible 
for any method to recover the information form the noisy 
observations. It is important to note that in the limit of infinite 
SNR, however, one can still estimate a sparse clustered set of 
frequencies regardless of their separation (e.g., using Prony’s 
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method of polynomial interpolation |[39l). 


Estimation Theoretic Bounds: Estimation theoretic quantities 
such as Cramer Rao Bound (CRB) and Ziv-Zakai Bound 
(ZZB) Pol provide lower bounds on the variance of estima¬ 
tors. For single frequency estimation, these bounds are given 


by Ell 


CRB {SNR) = 


6 

SNR X {N^ 


1 )’ 


( 10 ) 


and 


ZZBiSNR) = / Q 


ISNRl 1- 


sin(7Vh/2) 


N sin(/i/2) 


h dh. 
( 11 ) 


In Figure [T] we plot estimation error bounds as a function 
of SNR = ||x(w)||^/cr^ = Xja^. The ZZB has a dis¬ 
tinct threshold behavior: large frequency estimation errors 
are inevitable in a “low SNR” regime below a threshold, 
while the ZZB converges to the CRB when the SNR is large 
enough compared to this threshold. In Section IVll we compare 
algorithms in terms of high SNR behavior relative to the CRB, 
and low SNR behavior relative to the ZZB threshold. Note that 
even if per sinusoid SNRs are higher than the ZZB threshold, 
the joint estimation problem for multiple sinusoids may still 
be ill-posed (e.g., if the frequency separation is too small). 

Remark 1: We have defined “integrated SNR” obtained by 
dividing the total power of a sinusoid by the noise power per 
complex dimension, i.e., SNR = An alternative 

definition of signal to noise ratio is the “per-sample SNR”, 
given by SNRsampie = = jjSNR. For example, 

SNR = 25 dB (which is the nominal SNR value in our 
simulations), corresponds to SNRsampie ~ 1 dB- 



Fig. 1: CRB and ZZB for estimating the frequency of a single 
sinusoid. 


III. CFAR-based Stopping Criterion 

Detection problems face a tension between false alarm 
and missed detection (or miss, for short). In many detection 
problems, a model for the signal can be elusive. Thus, a 
common strategy is based on the Constant False Alarm Rate 
(CFAR) criterion ll42l . Il43l , which only requires a model for 
noise. Here, we use the CFAR criterion to estimate model 
order (i.e. number of sinusoids in the mixture K): if the 


residual signal can be explained “well enough” by noise, up to 
a target false alarm rate, then we stop. We show by simulation 
that the actual false alarm rate is close to the nominal being 
designed for. 

We also estimate the probability of miss, taking into account 
the effect of noise but ignoring “interference” from other si¬ 
nusoids. The resulting receiver operating characteristic (ROC) 
turns out to be in remarkable agreement with simulations. 
This shows that, when the sinusoids are separated beyond a 
minimum separation, and when their SNRs exceed the ZZB 
thresholds for individual sinusoids, then the probabilities of 
false alarm and miss are dominated by noise rather than by 
inter-sinusoid interference. 

A. Stopping Criterion 

The algorithm terminates when 

Gy,(P)M = Ky,-(T’),x(w))|^ < T 

for all DFT frequencies {27rfc/A^ : fc = 0,...,A^—1}. In 
other words, we stop when 

ll-^yr(^)llL <T 

where Rsl is the Discrete Fourier Transform of a, and report 
V as our estimate of the sinusoids in the mixture. 

Suppose that we have already correctly detected all sinu¬ 
soids in the mixture. In this case, the residual is yr('P) ~ z, 
where z ^ CA/'(0, (since is a projection matrix, the 

statistics of WGN are unchanged by it). It is easy to show that 

Pr {|| J-yr(iP)|lL > r} = 1 - (1 - exp(-T/a2))'^ . (12) 

We choose our stopping criterion threshold r so that 
Pr |||J^yr(P)||^ > r| = P^, where Pfa is a nominal false 
alarm rate. Using (O, we can explicitly compute this thresh¬ 
old as 

T = -cr^ log (^1 - (1 - Pfa)^/^) . 

A more easily interpreted expression can be obtained via 
asymptotics for large N 1441 (which provide excellent ap¬ 
proximations for the moderate values of N used in our 
numerical results). Fet Mn = ||-Pyr(P)||^. If yr(P) ~ z, we 
have E[MAr] = I ~ cr^logW, and the asymptotic 

distribution of P = Mjq — cr^ log N is given by Pr{P < t} = 
exp(— exp(—a;/cr^)). We set t — log{N) + x, for x so that 
Pr {E > x} is equal to the nominal false alarm rate Pfa. This 
is given by x = —cr^ log log (1/(1 — Pfa)) and the resulting 
expression for the threshold r is 

T = \og{N) - log log (1/(1 - Pfa)) . (13) 

Figure |2] plots measured versus nominal false alarm rates 
for different values of nominal Pfa. Each point in the plot 
is generated by 300 runs of NOMP algorithm for estimating 
frequencies in a mixture of AT = 16 sinusoids of fixed nominal 
SNR. The minimum frequency separation Awmin = 2.5Adft. 
We declare a false alarm whenever NOMP overestimates the 
model order K. As shown in Figure |2] the empirical false 
alarm rate closely follows the nominal value at various SNRs. 
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Fig. 2: Nominal v.s. measured probability of false alarm. 


Fig. 3: Measured v.s. computed ROCs 


B. Probability of Miss 

Define neighborhood around each true frequency uji by 
= {w : dist(a;, Wi) < 0.25 x Adft}. We declare successful 
detection of uji if at least one of the estimated frequencies lies 
in otherwise we declare a miss for oji. A miss results 
from noise and inter-sinusoid interference, but we only model 
noise here. For the minimum separation considered here, we 
show using simulations that the empirical probability of miss 
is only a little higher than the analytical estimate that we derive 
below. 

A sinusoid of amplitude A leads to a maximum FFT 
value of a A, where a £ [0,1] captures the amplitude re¬ 
duction due to the grid mismatch. The magnitude of the 
maximum FFT coefficient, denoted by Mfft, is Rician with 
Pr{Mfft < x} = 1 — Qi ^ where Qi is Marcum 

Q-function. The sinusoid is not detected by the algorithm if 
A/fft < \/t, hence 

Fmiss = 1 - Qi (as/2SNR, s/2^'^ . (14) 

Assuming uniform distribution for a frequency within a DFT 
grid interval gives us E[a] = E[sin(Aa;/2)/(Asin(w/2))] = 
0.88, where w ^ Uniform]— tt/A, tt/A]. Therefore, 

Fmiss ~ 1 - Qi (0.88V25Ai?, x/2r/a2) . (15) 

Putting equations (ITbT i and (ITSl l together, we can characterize 
the ROC at various SNRs as shown in Figure |3] The simulation 
parameters are the same as Figure |2] .We see that for SNR= 
18 dB the probability of miss is negligible. When SNR goes 
below the ZZB threshold e.g. SNR= 14 dB, the probability of 
miss is bounded away from zero. This behavior is predicted by 
the ZZB threshold and our ROC analysis for single frequency 
estimation. However, as we see in Figure |3| they serve as 
excellent approximations for multiple frequency estimation. 

Remark 2: For the simulations in this paper, we have used 
the CFAR-based stopping criteria for the NOMP algorithm. 
However, a variety of other stopping rules, e.g., Bayesian In¬ 
formation Criteria (BIC) Il45l and Akaike Information Criteria 
(Aic) m, can be easily adapted for use with the NOMP 
algorithm. In Section FVI-DI we investigate the performance of 
the BIC stopping rule (see Appendix I for a quick overview) 
as well as CFAR-based stopping criteria for NOMP. 


IV. The Need to Oversample 


In this section, we show that oversampling is indeed re¬ 
quired at the detection stage, in order for Newton refinements 
to converge to the maximum of the GLRT cost function. We 
ignore noise in these discussions. The GLRT cost function 
is given by Gy{uj) = \ J2iLi ~ where h{uj) = 

^”siii^(^/ 2 ) Dirichlet kernel. Characterizing the minimum 

required oversampling factor for this general setting is difficult 
because of the highly non-convex nature of Gy{uj). Hence, we 
focus on a single frequency setting (K = 1). 

We wish to arrive at uji by optimizing Gy{uj) us¬ 
ing Newton refinements starting on a coarse grid. We 
note that argmax^^ Gy(a;) = argmax^j — wi)p = 

argmax;^ \h{uj — wi)p. We would like to characterize the 
minimum oversampling factor such that if we start off from 
the best guess of the maximum of Gyfuj) on the grid, the 
Newton refinement stage will take us to uii. That is, we must 
characterize how close to wi must the nearest grid point lie, so 
that Newton refinements will always take us to from this 
grid point. Without loss of generality, we set wi = 0 (since 
we have shifted our frequency axis such that wi = 0, no grid 
point may lie on 0). 

We start by normalizing frequencies by the DFT spacing. 
In this scaled frequency axis, the Dirchelet kernel is given by 
h{x) = jv sininxjN) ’ where x = w/Adu is sometimes referred 
to as the normalized frequency. As shown in 1471 . the Newton 
method converges to the solution of h'{x) = 0 quadratically 
if the initial guess xq lies in an interval I around the true 
solution where the following conditions are met: 

• h''{x) f 0 Vx £ /. 

• h"'(x) is finite \/x £ I. 


• jxoj < 1/M where M = sup,^gj0.5 
Figure |4| shows the function h{x 


h'" {x 


h" (tc) 

and its derivatives in a 
window around the origin. The first two conditions are met 
for any interval I C (—1,1). Therefore, it only remains to 
satisfy the third condition. By simple algebra one can see that 
if we set I = (-0.45,-1-0.45), then we have jxoj < 1/M for 
any xq £ I. Therefore the maximum acceptable grid spacing 
is about 0.9 which is equivalent to minimum oversampling 
factor Ri 1.12. This simple analysis shows that, even for a 
single sinusoid and no noise, we must sample beyond the 
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- 2-1012 - 2-1012 

Fig. 4: Dirichlet kernel h(x) and its derivatives. 

DFT grid to ensure that the two-stage detection/refinement 
procedure successfully identifies the maximum of the GLRT 
cost function, thereby imitating pursuit over the continuum. 
Our simulations show that setting the oversampling factor at 
7 = 4 or more works very well, independent of the number 
of the sinusoids in the observations. 

V. Convergence 

In this section, we first characterize convergence by provid¬ 
ing upper bounds on the number of iterations of NOMP. We 
then provide a bound on the rate of convergence of NOMP as 
a function of the “atomic norm” of y, and the oversampling 
factor 7 . Our convergence results are pessimistic, in that they 
do not account for the effect of the rehnement steps. We show 
by simulations the dramatic improvement due to rehnements, 
by comparing NOMP to the following variants of OMP: 
Discretized OMP (DOMP); This is standard OMP applied 
to the oversampled grid of sinusoids O. We use our NOMP 
implementation with number of refinement steps set to zero, 
increasing the oversampling factor to 7 = 20. Since DOMP 
can be interpreted as a special case of NOMP, the convergence 
analysis presented here is valid for DOMP as well. 

NOMP without Cyclic Refinements (NOMP-): If we skip 
the Cyclic Refinement step of NOMP, then we get an 
algorithm that emulates OMP over the continuum of atoms. 
Note that NOMP- does not have a feedback mechanism, hence 
it lies in the class of forward greedy methods. Our convergence 
analysis also holds here. 

A. Proof of Convergence: 

A trivial upper bound on the number of iterations of NOMP 
is the number of the observations N. This is a direct result 
of solving the least squares at Step [TO] of the algorithm. After 
N iterations, Ai is a square full-rank matrix (no frequency 
is detected twice), hence the residue is zero for {N -f 1)*^ 
iteration and the algorithm terminates. 

The following theorem states another upper bound on the 
number of iterations, obtained by characterizing the amount 


by which the residual energy decreases when adding a new 
frequency to the set of estimated sinusoids. 

Theorem 1: The reduction of residual energy due to one 
iteration (adding a new sinusoid) is at least r. Consequently, 
min{W, ||y||^ /t} is an upper bound on the number of itera¬ 
tions of the algorithm. 

Proof: The residue at m* iteration of the algorithm is given 
by yr(^m) = y - The energy of the residue 

in each iteration of the algorithm satisfies the following, 

||yi-(T’m-i)||^ *= ||yi-(T’m)|| + G'y7-p^_j)(w) 

> \\'yii'Pm)\\ + 

> ||yi-(T’m)||^ + G'y7-p^_^)(a>) (16) 

(d) „ 

>\\yr{rm)f+T. (17) 

where (a) follows from Step |5] of the algorithm where we 
project yr(Pm-i) orthogonal to the subspace spanned by x(w) 
to get yi{V[yf). Inequalities in (b) follow from (RAC) checks 
performed whenever the single frequency refinement algorithm 
is invoked and (c) from the fact that least squares UPDATE can 
only lead to a decrease in energy of the residual signal, (d) is 
a direct consequence of the stopping criteria of the algorithm. 

Inequality (E) shows that the reduction of the residual 
energy due to detecting a new sinusoids is always greater than 
T. This result bounds the number of iterations of the algorithm 
from above by Hyll^y^T, proving convergence. Combining 
this observation with the trivial upper-bound N completes the 
proof. 

B. Rate of Convergence: 

We first bound the maximum of the GLRT cost function 
Gy{u!) over the continuum of frequencies, in terms of that 
over the oversampled grid. To this end, we borrow ideas used 
in jS) (Appendix C) for proving a similar property for the dual 
atomic norm. We briefly introduce the notion of atomic norm 
(also known as dictionary norm) llJTl and specialize to the line 
spectral estimation problem. 

The atomic set of unit norm sinusoids is given by 
A = : (j),uj G [0, 27r)}. The atomic norm for y is 

defined by 

l|ylU = inf {f > 0 : y e f conv(yl)} . (18) 

where conv(yl) denotes the convex hull of the points in A. 
Since the centroid of the conv(yl) is at the origin, the atomic 
norm can be rewritten as HSl . 1491 

l|y|lA-inf|^|cfi| : y = ^Pix(a;z), x(a;i) G Alj . 

(19) 

Note that this is not the ii norm of y, but the £i norm on the 
coefficients of the representation of y by elements of A. The 
atomic norm is typically small when its argument has a good 
sparse approximation El. The dual norm of H-H^ is defined 
by 

||y||^ = sup5i{(a,y)}. (20) 

a&A 






It is easy to see that 


l|y|Q = sup sup K{e*‘^(x(a;),y)} 

u;G[ 0 , 27 r) 0 G[O, 27 r) 

= sup |(x(w),y)| 

a;G[0,27r) 

= sup JCyiuj). ( 21 ) 

i^G[ 0 , 27 r) * 

Directly borrowing from IS] gives us the following Theorem. 


Theorem 2: la Maximizing the GLRT cost function (for 
the dictionary of unit norm sinusoids) over [ 0 , 27r) is consistent 
with that over the oversampled grid 17 with oversampling 
factor 7 . That is, we have 

max A /Gy (w) < sup \ Gy{oj) (22) 

V ajG[ 0 , 2 ir) * 

^ (i-t) (23) 

See (ISl, Appendix C) for a proof. 

We need the following lemma to prove Theorem [3 which 
provides a pessimistic characterization of the convergence rate. 

Lemma 1: Assume {a„}„>o is a decreasing sequence of 
nonnegative numbers such that ao < U and 

an < an-1 (1 - -^) > Vn > 0, 

then we have a„ < for all n > 0 . 

The proof is by induction llJTl . Suppose a„-i < Either 
an-i < in which case a„ < or a„_i > in 
which case 


< 

1—1 

1 

e 

(Xn 

1 

< 

-fi- 

U ' 
n+1 


n [ 

u 


U 

n + 1' 


Hence, a„ < for all n > 0. 

Theorem 3: For all y such that ||y||^ < 00 , the residual 
energy of NOMP at the m*^ iteration satisfies 

||yr(P™)||<(m + l)-i/2(^l-^) ||y||^. (24) 

Proof: From (fThl l. we have 

\\y.{Vm)\? < \\y.{Vm-i)\? - Gy, )((£;). (25) 

yi{Vm-i) is the result of projecting y orthogonal to the 
subspace spanned by Vm-i, therefore 


\\yr{Vm-l)f 


= {yri.'Pm-l),y) 

< l|ylU llyr(^m-l)ll^ 

= l|ylU sup J Gy,(-p^_,){w) 

wG[0.27I-) ^ 

(b) / 27r\“^ /- 

^ llylU (^ “ f) “In V 


where (a) follows by Holder’s inequality Il50l . and (b) is by 

Theorem ID Let ry = ||y||^ ^1 — ■ From Step |3 of the 

algorithm we have 



hence. 


Ilyr+m-lf < ?7Y^Gy,(p„._^)(w). 

(26) 

Combining (l25ll and (l26ll. gives 


\\yXVm)f < ||yr+„.-i)f (1 - ||yr+™-l)f) 

■ (27) 

Using Lemma [Hand the fact that 


||yr+o)f = ||yf <||y||^<+, 

(28) 

we have 

l|yr++f 

TO + 1 

(29) 

In other words. 


||yr+m)|| < (to + l)-l/2 ^1 - ||y|+ . 

(30) 


This proves the Theorem. 

For the simulations we set 7 = 4, but it is worth mentioning 
that, since we employ FFTs for detection over the oversampled 
grid, increasing 7 has marginal effect on the runtime of 
the algorithm. In fact setting 7 = 20 leads to only about 
5% increase in runtime. If we compare (l30l) to the rate of 
convergence of OMP over the continuum given by Ell, 

||y|P™)f <(m + l)-||y||^, (31) 

we see that by choosing 7 large enough, the bound on the 
convergence rate specified in (l30l l approaches that of OMP 
over the continuum. Note that, in the derivation of (l30l l we 
have not considered the effect of the refinement steps, but 
imposing RAC ensures that refinements can only help speed 
up convergence. 


C. Empirical rate of convergence: 

We use numerical simulations to highlight the convergence 
benefits of the refinement steps in NOMP compared to the 
bound specified in (l30l l. We plot the mean residual energy 
(averaged over 1000 runs) as a function of the number of 
iterations in a noiseless setting. We set K = 16, and Awmin = 
2.5 X Adft. Figure |5] shows that light oversampling (7 = 4) 
followed by SINGLE Refinement step (NOMP-) leads to a 
better convergence rate than having a large oversampling factor 
(7 = 20) and no refinements (DOMP). On the other hand, 
NOMP enjoys an extremely fast convergence rate due to the 
Cyclic Refinement step. In fact we see that for the setting 
where Awmin is fairly large, the residual energy essentially 
drops to machine precision after 16 iterations, which equals 
the number of sinusoids in the mixture. 
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VI. Simulation Results 

Our performance measure is the mean squared error (MSB) 
of frequency estimation, and we compare the performance of 
NOMP against a number of benchmarks in various settings. 
Benchmarks: The MUSIC algorithm is implemented using a 
modified version of the Matlab routine rootmusic. The 
modifications are two-fold: (1) we use Minimum Description 
Length (MDL) criterion im for estimating the number of 
sinusoids in the mixture; (2) MUSIC operates by constructing 
an estimate of the autocorrelation matrix of the observed data 
vector. To this end, we use a sliding window of size W, 
to generate multiple snapshots from the observation vector 
y £ C^. The choice of W has a significant impact on the 
performance of the algorithm, both in terms of estimation 
accuracy and run time: W too large leads to an inaccurate 
estimate of the autocorrelation matrix and the signal subspace, 
whereas too small a value effectively reduces the size of the 
observation window in time, degrading frequency estimation 
accuracy. We have found (empirically) that setting lU = 96 
results in the best estimation accuracy for the nominal settings 
in our simulations. 

For sparse convex optimization, we consider Atomic norm 
Soft Thresholding (AST) ||8l, ll52l . We use the alternating 
direction method of multipliers (ADMM) ll5?i to implement 
AST, as suggested in JS]. The updates in ADMM are iterative 
(typically 100 ~ 200 iterations for each simulation run), and 
each iteration includes an eigenvalue thresholding step for an 
(N -f 1) X (N + 1) matrix. This 0{N^) step dominates the 
computational cost of the ADMM method, and becomes very 
expensive for large N. 

The authors in JS] suggest solving Lasso as an alternative 
to the semi-definite program induced by AST. Lasso is solved 
on an oversampled frequency grid, using the highly optimized 
£2 — £1 software package SpaRSA ||5^ . We set the tolerance 
parameter to be 10“^ (other than this, we use the default 
parameters): smaller values of the tolerance parameter (e.g. 
10 “"^) increase runtime significantly, while providing marginal 
performance improvement. The regularization parameter in 
AST and Lasso formulation, suggested in [jS], is set to 

reg = CT ^1 -f y/log N + log(47rlog A^). 


The oversampling factor for the Lasso solver is set to 10 in 
our simulations. While increasing the oversampling factor 
improves Lasso performance, as mentioned in M, for an 
oversampled grid, the frequencies estimated by Lasso cluster 
around the true frequencies. In order to avoid drastically 
overestimating model order, we implement a simple clustering 
scheme. In our simulations, we group frequencies into the 
largest number of clusters possible so that the frequency 
separation between any two sinusoids in different clusters 
is no smaller than 0.25 x Adft. After clustering, we update 
the gains of the cluster centers by solving the least squares 
problem minimize{g,} ||y - I]; 5 /x(a;i)||^. 

Newtonized Lasso (NLasso): We also compare the results of 
NOMP algorithm with an extension of the Lasso formulation. 
In this scheme, we first apply the Lasso solver to identify the 
frequencies over the highly oversampled grid. Then we run the 
Cyclic Refinement step of the NOMP algorithm in order 
to refine the estimated frequencies, in order to prevent error 
floors caused by the off-grid effect. The parameters of Lasso 
are unchanged and the number of refinement steps is set to 5. 
Our simulations show that, for well-separated frequencies, the 
refinement step significantly improves estimation accuracy 
while incurring a small increase in runtime compared 
to Lasso. However, the benefit of refinement for Lasso 
diminishes as we increase the difficulty of the estimation 
problem (small Awmin). 

Simulation set-up: We consider a mixture of IT = 16 
sinusoids of length N = 256. We perform 300 simulation 
runs for each of the four scenarios characterized by AtUmin 
and SNR values. The settings for different scenarios are 
summarized in Table |I] 


Scenarios 

SNR (dB) 

^^min/ ^dft 

1 

SNRnon, 

2.5 

2 

SNRnom 

0.3 

3 

Uniform [15,35J 

2.5 

4 

Uniform[15, 35J 

0.3 


TABLB I: Settings of different Scenario 


In Scenarios 1 and 2, the nominal SNR for each sinusoid is 
set as SNRnom = 25 dB, whereas for Scenarios 3 and 4, the 
SNR values are chosen uniformly from [15, 35] dB, with mean 
equal to the nominal SNR of 25 dB. In each simulation run, the 
gain magnitudes are set to \gi\ — crv^SNRj, while the phases 
{^gi} are chosen uniformly from [0, 27r). The frequencies are 
chosen uniformly at random from [0, 27r)^ while respecting 
the minimum separation constraints specified by Awmin (if the 
minimum separation criterion is not met, we sample again 
from [0, 27r)^). We plot the Complementary Cumulative Dis¬ 
tribution Function (CCDF) of the squared frequency estimation 
error for all algorithms, along with the CRB (also a random 
variable, since it differs across realizations), and also compare 
against the DFT spacing, which is the resolution provided by 
coarse peak picking. See Appendix II for a quick overview of 
Cramer Rao Bound. The parameters of NOMP algorithm are 
set according to Table HJ for different scenarios. 
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NOMP 

Scenario 1 

Scenario 2 

Scenario 3 

Scenario 4 

He 

1 

3 

1 

3 

Rs 

1 

1 

1 

1 

7 

4 

4 

4 

4 


TABLE II; NOMP parameters at different scenarios. 



Squai'ed frequency estimation error in dB 
(relative to DFT) 


(a) 



(b) Zoomed in 

Fig. 6; CCDF of the frequency MSB for Scenario 1. 


A. Frequency estimation accuracy 

1) Distribution of error: Let us first examine the CCDF of 
squared frequency estimation error in each scenario. Figure 
|6] shows that NOMP, AST and NLasso lead to very similar 
error distributions in Scenario 1, while outperforming the other 
methods. Unlike Lasso and DOMP, which suffer from the off- 
grid effect, MUSIC picks frequencies over the continuum, and 
achieves better estimation accuracy. Another observation is 
that if the frequencies are well separated (as in Scenario 1), 
then adding a refinement stage at the output of Lasso leads 
to a significant improvement in estimation accuracy. As we 
move to more difficult Scenarios, however, the performance 
of NLasso degrades compared to NOMP and AST as shown 
in Ligure |7] for Scenario 2. The reason is that the refinement 
stage of NLasso is able to improve the estimation accuracy 
only when the initial estimates provided by Lasso are close to 
the true frequencies. When Lasso fails in providing good initial 


estimates, there is no benefit in locally refining the frequencies. 

Ligure [S] shows the distribution of error in Scenario 3. 
We see that the overall gap in the performance of different 
algorithms is reduced compared to the first two scenarios, with 
AST, NOMP and NLasso still achieving the highest estimation 
accuracy. In Scenario 4, NOMP achieves superior performance 
compared to all of the other methods, as shown in Ligure |9l 



Squared frequency estimation error in dB 
(relative to DFT) 


(a) 



(b) Zoomed in 

Lig. 7: CCDL of the frequency MSB for Scenario 2. 

2 ) Mean squared error: Here we examine the performance 
of different algorithms in terms of frequency estimation accu¬ 
racy by looking at the normalized Mean Squared Brror (MSB), 
defined by Wfojtrue — different scenarios. In 

Scenarios 1 and 3, where frequencies are well-separated, one 
hopes to get to the estimation accuracy of a single sinusoid 
(as if the other sinusoids did not exist). We therefore use the 
CRB and ZZB corresponding to a single sinusoid, computed 
by ® and respectively, as measures of optimality. In 
Scenarios 2 and 4, where frequencies can get close to one 
another, we compute the CRB empirically for each realization 
of the problem, and employ the mean CRB as a measure of 
optimality. 

Ligure [TO] shows the MSB of frequency estimation in 
Scenarios 1 and 2, for SNRnom taking values from 13 dB to 35 
dB. In Scenario 1, if the nominal SNR is high enough, AST, 
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Squared frequency estimation error in dB 
(relative to DFT) 



Squared frequency estimation error in dB 
(relative to DFT) 


(a) 


(a) 




(b) Zoomed in (b) Zoomed in 

Fig. 8; CCDF of the frequency MSB for Scenario 3. Fig. 9: CCDF of the frequency MSB for Scenario 4. 


NOMP, and NBasso all achieve the CRB. As we decrease the 
nominal SNR, all of the algorithms exhibit threshold behavior, 
well predicted by the ZZB threshold. The threshold SNR of 
AST is lower than that of other methods, showing its noise 
resilience. MUSIC does not achieve the CRB, but closely 
follows the bound for all SNR values in this scenario. DOMP 
and Basso on the other hand, reach performance floors. We 
examine this floor more closely in Section lVI-CI to see whether 
it is a fundamental algorithmic limitation, or happens due to 
the off-grid effect. Bigure fTOl b corresponds to Scenario 2, 
where the separation between frequencies can be very small. 
We see that MUSIC is the only algorithm that benefits from 
increasing the nominal SNR. This goes back to the asymptotic 
optimality of MUSIC: for SNR -)■ oo and K N, MUSIC 
is able to precisely determine the frequencies in the mixture, 
regardless of the separation between them M. 

In Scenarios 3 and 4, the SNRs are drawn independently and 
uniformly at random from the interval [15, 35] dB. In order to 
evaluate the frequency MSB in these scenarios, we fix the 
SNR of one of the sinusoids in the mixture at a given value, 
while letting the other (AT — 1) SNRs to be realized randomly. 
Bigure fTTI -a shows the frequency MSB curves corresponding 
to Scenario 3. As we expected from error distribution plots in 


Bigure |8] the gap in the performance of different algorithms 
has decreased compared to the first scenario. Bigure fTTI -b 
corresponds to Scenario 4, in which NOMP outperforms all 
of the other algorithms, and tightly follows the CRB. This 
indicates that NOMP is highly successful in exploiting the 
disparity in SNRs across sinusoids in the mixture, in order 
to estimate closely-spaced frequencies. AST achieves the best 
performance at very low SNRs; however, its MSB curve stays 
bounded away from the CRB, with an expanding gap as we 
increase SNR. 

3) Number of cycles of Newton refinement (Rfi: We have 
seen that NOMP is able to achieve the CRB in Scenarios 1 and 
3, with one cycle of refining the sinusoids in each iteration, 
i.e., Rc = 1. In this subsection, we want to highlight the 
effect of increasing Rc in improving the frequency estimation 
accuracy of NOMP in Scenarios 2 and 4 where Awmin = 
0.5 X Adft. Bigure [T2| shows the frequency MSB of NOMP 
for Rc G {1, 3, 5}. We see that NOMP enjoys the benefits of 
having more rounds of refinement, but exhibits diminishing 
return as Rc increases. In particular, increasing Rc beyond 3 
cycles gives marginal improvement in estimation accuracy. 
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(a) Scenario 1 



Fig. 10; Normalized frequency MSE for Scenarios 1 and 2. 


B. Computation time 

Table [111] summarizes the time needed for running 300 sim¬ 
ulations for each of the algorithms in different scenarios. We 
see that DOMP is extremely fast at the expense of estimation 
accuracy. NOMP is faster than all of the other methods in all 
four Scenarios, while achieving remarkable frequency estima¬ 
tion accuracy. As we discussed in Section Hl-CI the CYCLIC 
Refinement step has the complexity 0{RcRsK'^N), and 
dominates the computational cost of NOMP. Table |IV] shows 
the time needed for 300 simulation runs of NOMP for different 
values of Re- Note that as we increase the difficulty of the 
estimation scenario, AST, Lasso and NLasso tend to take more 
time, while MUSIC and NOMP are unaffected. 


I'lme [secj 

NoMp 

AST 

NLasso 

Lasso 

DOMP 

MUSIC 

Scenario 1 

6.92 

1.09e3 

29.68 

26.63 

2.66 

19.83 

Scenaiio 2 

14.26 

1.02e3 

29.63 

27.62 

2.81 

20.15 

Scenario 3 

s:8S~ 

1.12e3 

34.32 

32.06 

T79 

20.07 

Scenario 4 

14.19 

1.18e3 

36.37 

33.71 

2.80 

19.69 


TABLE III: Time [sec] for 300 runs of each algorithm. 
Parameters of NOMP are set by Table [H] 



(a) Scenario 3 



(b) Scenario 4 


Fig. 11; Normalized frequency MSE for Scenarios 3 and 4. 


Time [sec] 

Rc = 1 

Rc = 3 

Rc = 5 

Scenaiio 1 

6.92 

14.24 

21757 

Scenario 2 

7.01 

14.26 

21.71 

Scenario 3 

6.88 

14.21 

21.61 

Scenaiio 4 

6.96 

14.19 

2T3(1 


TABLE IV: Time [sec] for 300 runs of NOMP algorithm, for 
different numbers of cyclic refinements Re- 


C. Asymptotic regime 

It is interesting to see the effect of increasing the over- 
sampling factor for Lasso and DOMP. Eigure [13] corresponds 
to Scenario 1, with the change being that the oversampling 
factor for these two algorithms is increased. We observe an 
improvement in the performance of both algorithms in terms 
of estimation accuracy, with that for Lasso being especially 
significant. Comparing these results to those in Figure [Toj-a, 
we see that the MSE performance of DOMP is marginally 
improved by increasing 7 from 10 to 100. This performance 
plateau shows a fundamental algorithmic limitation of DOMP, 
and highlights the critical role of cyclic Newton refinements in 
NOMP. In other words, the performance limitation of DOMP 
is not just due to the off-grid error, but also a consequence of 
making “hard-decisions” at each iteration. The computational 
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(b) Scenaiio 4 


Fig. 12: Performance improvement of NOMP with increasing 
the number of cyclic refinements in Scenarios 2 and 4. 

complexity of DOMP is insensitive to oversampling factor. For 
example, computation time increases from 3.25 to about 7.96 
seconds as we go from 7 = 20 to 7 = 100 . 

In theory, Lasso is only limited by the oversampling factor 
7, and as 7 00, the result of Lasso converges to that of AST, 

as a consequence of the convergence of the corresponding 
atomic norms 0 . In practice, however, the performance of 
Lasso solved on a moderate size grid might be far from that 
of AST. Figure [13] shows that, at large enough oversampling 
factors. Lasso approaches the CRB, but the computational cost 
becomes prohibitive. For example, the computation time of 
Lasso for 300 runs, increases from 72.16 seconds to 191.22 
seconds as we go from 7 = 20 to 7 = 50. 

D. Model order estimation 

Estimating the model order K (the true number of non¬ 
zero atoms in the mixture) has significant importance in sparse 
approximation. Here we examine the Cumulative Distribution 
Function (CDF) of the estimated model order by different 
algorithms. As shown in Figure [T4| both AST and MUSIC 
perform well, with MUSIC performance slightly degrading 
in scenarios with small Awmin. Note that the mean of the 



Fig. 13: Frequency MSE for Scenario 1 and highly over¬ 
sampled grid for Lasso and DOMP. 

distributions are very close to the truth (small bias), and they 
have a small spread around the mean (small variance). Eigure 
[ 15 ] shows the model order estimates for NOMP using both 
CEAR and BIC-based stopping criteria. We see that both 
criteria are very accurate in all four scenarios. 

On the other hand. Lasso and DOMP perform poorly in 
estimating the model order, especially in scenarios with small 
Awmin- As shown in Eigure (Th] DOMP tends to overestimate 
the model order when some of the frequencies are placed too 
closely. This is again the result of a fundamental algorithmic 
limitation. DOMP does not allow for correcting errors that 
have happened in the previous iterations; instead, it tries to 
explain the residual energy by overestimating the number of 
non-zero atoms. 

In Scenario 2, Lasso tends to underestimate the model order. 
The main reason is that for two closely spaced frequencies. 
Lasso generates two overlapping clusters of estimated frequen¬ 
cies, which are later replaced by a single frequency by our 
clustering algorithm. On the other hand, if we do not employ 
clustering. Lasso significantly overestimates the model order. 

VII. Extensions of the Algorithm 

In this section we point out an immediate extension 
of NOMP algorithm. Specifically, we can replace the 
manifold of sinusoids {x(a;) : w G [0,27r)} by 

{Ax(a;)/II Ax(a;)|| : u G [0, 27r)}, where A G is a 

known measurement matrix. This is motivated by the follow¬ 
ing measurement setup: 

K 

y = '^9iAx{uJi). (32) 

1=1 

Compressive measurements 

We consider a compressive measurement model in which 
the number of measurements M <C N. As in the bulk 
of literature on compressive sensing, we assume that the 
elements of A are chosen i.i.d from appropriate zero- 
mean distributions (with variance conveniently scaled to I/A 
such as Uniform{±I/-\/]V}, Uniform{±I/\/]V, ±//\/]V}, 
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Fig. 14; CDF of the estimates of the model order for AST and 
MUSIC. 


Fig. 16: CDF of the estimates of the model order for Lasso 
and DOMP with 7 = 20. 
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Fig. 15; CDF of the estimates of the model order for NOMP 
using CFAR and BIC criteria. 


Af{0^1/N)^ etc.,) so that certain concentration results hold. 
It has been shown in BH that when A satisfies certain 
isometry conditions (related to the estimation problem at 
hand), the CRB and ZZB are approximately preserved for 
compressive estimation, except for an SNR degradation of 
M/N due to randomly projecting down the signal to a smaller 
space. The number of compressive measurements needed to 
estimate continuous frequencies scales as M = 0{K log N) 
ED. In order to get concrete numerical intuition, we set 
M = N/A = QA with the elements of A chosen uniformly and 
independently at random from {±l/vV, and run 

NOMP with atoms {s(a;) = Ax(w)/ ||Ax(w)|| : w S [0,27r)}. 
We consider Scenario 1 with Rc = 3, with number of sinusoids 
set to iC = 13 and K = 16. Our algorithm approaches the 
CRB in the setting where K = 13, whereas we incur large 
estimation errors when K — 16; see Figure [TD The large 
estimation errors for iT = 16 occur because the compressive 
measurement matrix A fails to preserve the structure of the 
estimation problem: M = 6A compressive measurements is 
too few for K = 16 sinusoids. 

VIII. Conclusions 

We have shown that NOMP is fast and near-optimal for fre¬ 
quency estimation in AWGN. It performs better than classical 
methods such as MUSIC, and more recent convex optimization 
based methods such as AST, in terms of both estimation accu¬ 
racy and run time. The algorithm uses a fundamental element 
of OMP, ensuring that the residue is orthogonal to the signal 
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Squared frequency estimation error in dB 
(relative to DFT) 

Fig. 17: CCDF of the frequency MSB for Scenario 1 with 
Compressive measurements. 


space spanned by the current set of frequencies. However, 
NOMP avoids the error floors of naively discretized OMP by 
refinement over the continuum. Specifically, it searches for a 
signal subspace in the “neighborhood” of our current signal 
space which can better explain the observed measurements. 
The algorithm has a natural “decision feedback” interpretation 
in which it gives the already detected sinusoids a chance to 
adjust their frequencies in light of new evidence which is 
presented in the form of an updated residue after we add the 
next sinusoid. 

We believe that the ideas underlying NOMP are broadly 
applicable to problems of sparse approximation over contin¬ 
uous dictionaries, but further work is required to justify this 
assertion. An open problem is to go beyond the pessimistic 
convergence rate estimates provided here by analytically quan¬ 
tifying the benefits of refinement. 
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Appendix I: Bayesian Ineormation Criteria 

The BIC balances an increase in the likelihood with the 
number of parameters used to achieve that increase. Namely, 

BIC = 21n - (^2 -TOi)ln(W), (33) 

Li(0i, . . . , (pmi) 

where Li{cj)i,..., (j)mi) and L 2 { 0 i,..., 0 ^ 2 ) are likelihood 
functions and and {dj}Y=i are their corresponding 

parameters. For our measurement model, y = J^iLi + 

z, where z ^ A/'(0, we have 

L{{9h^i}f=i) = 

(ttct ) 

where = y — the residual. Therefore, the 

BIC criterion will be as follows: 

BIC = 4A||y.||"-21n(7V), (34) 


where AHy^lP = ||yr(o/d)|p — ||yr(neu>)|p is the reduction 
in residual energy after detecting a new sinusoid. If BIC > 10, 
it is a strong evidence that the most recent reduction in the 
residual energy corresponds to a newly detected sinusoid. 
Therefore, when BIC < 10, we stop the algorithm. 


Appendix II: Cramer Rao Bound 


In this Appendix we first review the Cramer Rao Bound |2l 
for an unbiased estimator in a general setting, then specialize 
to the frequency estimation problem. Let a G If a^0(y) 
is an unbiased estimator of a^0, then the variance of the es¬ 
timator given by Ey|e 


(a'^0(y) - a'^ 0 ) 


is lower bounded 


by a^f ^(0)a, where F{6) is the Fisher Information Matrix. 
The (m,n)* element of F{6) is given by 


. p fainp(y|0)91np(y|0)] 

For parameter estimation in additive white Gaussian noise i.e. 
y = s{0) + z, z ^ CJ\f{0, cr^I), equation dTSl l simplifies to. 


M = 


ds{6)Y 


(36) 


For our frequency estimation problem in (|2|i, 9 is the vector 
of all parameters, namely {\gi\, Zgi,uJi : I = and 

— Sti We form F{6) for this measure¬ 

ment model then choose those diagonal elements of F~^{9) 
that correspond to the frequencies {oji ■. I — 1,..., K}. 
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